Dissimilarity measures for histogram-valued data and divisive clustering of symbolic objects

نویسنده

  • Jaejik Kim
چکیده

Contemporary datasets are becoming increasingly larger and more complex, while techniques to analyse them are becoming more and more inadequate. Thus, new methods are needed to handle these new types of data. This study introduces methods to cluster histogram-valued data. However, histogram-valued data are difficult to handle computationally because observations typically have a different number and length of subintervals. Thus, a transformation for histogram data is proposed as a technique for handling them more easily computationally. From this technique, three new dissimilarity measures for histogram data are proposed. Then, how the monothetic clustering algorithm based on Chavent (1998, 2000) can be extended to histogram data is shown, and a polythetic clustering algorithm for symbolic objects is developed (based on all p variables). Validity criteria to aid in the selection of the optimal number of clusters are described and verified by some simulation studies. The new methodology is illustrated on a large dataset collected from the US Forestry Service. Index words: Symbolic data, Histogram-valued data, Dissimilarity measure, Monothetic algorithm, Polythetic algorithm, Validity Dissimilarity Measures for Histogram-valued Data and Divisive Clustering of Symbolic Objects

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Relational partitioning fuzzy clustering algorithms based on multiple dissimilarity matrices

This paper introduces fuzzy clustering algorithms that can partition objects taking into account simultaneously their relational descriptions given by multiple dissimilarity matrices. The aim is to obtain a collaborative role of the different dissimilarity matrices to get a final consensus partition. These matrices can be obtained using different sets of variables and dissimilarity functions. T...

متن کامل

A New Approach to Detect Congestive Heart Failure Using Symbolic Dynamics Analysis of Electrocardiogram Signal

The aim of this study is to show that the measures derived from Electrocardiogram (ECG) signals many a time perform better than the same measures obtained from heart rate (HR) signals. A comparison was made to investigate how far the nonlinear symbolic dynamics approach helps to characterize the nonlinear properties of ECG signals and HR signals, and thereby discriminate between normal and cong...

متن کامل

A New Approach to Detect Congestive Heart Failure Using Symbolic Dynamics Analysis of Electrocardiogram Signal

The aim of this study is to show that the measures derived from Electrocardiogram (ECG) signals many a time perform better than the same measures obtained from heart rate (HR) signals. A comparison was made to investigate how far the nonlinear symbolic dynamics approach helps to characterize the nonlinear properties of ECG signals and HR signals, and thereby discriminate between normal and cong...

متن کامل

Divisive Monothetic Clustering for Interval and Histogram-valued Data

We consider some classically based methods for fitting a multiple regression model to intervalvalued data (de Carvalho et al., 2004; Lima Neto et al., 2005; Lima Neto and de Carvalho, 2010). Then, a so-called symbolic model is fitted where now the regression parameters are estimated by using the symbolic sample covariance and variance functions of Billard (2008) and Bertrand and Goupil (2000). ...

متن کامل

Analysis of Distribution Valued Dissimilarity Data

We deal with methods for analyzing complex structured data, especially, distribution valued data. Nowadays, there are many requests to analyze various types of data including spatial data, time series data, functional data and symbolic data. The idea of symbolic data analysis proposed by Diday covers a large range of data structures. We focus on distribution valued dissimilarity data and multid...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009